We use LINUS, a procedure developed by Srinivasan and Rose, to provide a physical interpretation of, and to predict, the secondary structures of proteins. The secondary structure type at a given site is identified by the largest conformational bias during short-time simulations. We examine the rate of successful prediction as a function of temperature and of the interaction window. At high temperatures, there is a large propensity for the establishment of β-strands, whereas α-helices appear only when the temperature is lower than a certain threshold value. We find that there exists an optimal temperature at which the correct secondary structures are predicted most accurately, and that this temperature is close to the peak temperature of the specific heat. Changing the interaction window or carrying out longer simulations approaching equilibrium leads to little change in the optimal success rate. Our findings are in accord with the observation by Srinivasan and Rose that secondary structures are mainly determined by local interactions and appear in the early stage of folding.

Keywords: protein folding; secondary structures;

INTRODUCTION

A knowledge of the three-dimensional structure of a protein is crucial for understanding its biological function. Unfortunately, the rate at which protein structures can be experimentally solved lags far behind the speed at which sequences are determined. With progress in the Human Genome Project, a good computer-based method for predicting protein structures from their sequences would be an invaluable tool for modern microbiology as well as for drug design. The existing methods for structure prediction can be divided into two classes: 1) template-based methods, which compare a sequence of unknown structure against a library of solved structures, and 2) ab initio methods, which seek to identify the native fold, usually defined as the lowest-energy point in conformational space. The latter are especially useful when a target sequence has low similarity to existing protein sequences of known structure. It should be noted, though, that many so-called ab initio methods do use information derived from the protein database as input.

Significant progress has been achieved in the ab initio approach to protein structure prediction, as witnessed in the CASP competitions, wherein the structures of large protein fragments, comprising as many as 100 residues, were predicted with an accuracy of 4-7 Å in rmsd. A notable success was reported by the Baker group. Their approach assembles protein conformations from fragments of known structures in the protein database whose local sequences are similar to that of the target sequence, using statistically derived scoring functions. In Levitt's approach, secondary structures predicted by several existing secondary structure prediction methods are fitted to the best-scoring compact conformations obtained on a simplified tetrahedral lattice. Scheraga and coworkers use an off-lattice Cα-based model with interactions imposed on virtual side-chains and virtual peptide groups. The lowest-energy Cα trace obtained by extensive conformational space annealing is then converted to an all-atom backbone for further refinement. Skolnick et al. built discretized protein conformations using predicted secondary structures and a number of tertiary restraints derived from multiple sequence alignments. The success of these ab initio methods relies to a large extent on knowledge-based information, i.e. data derived from known protein structures, such as that used in the scoring functions, in secondary structure prediction, or in the choice of fragments to incorporate in the model.

Our work deals with secondary structure prediction and builds on a truly ab initio protein structure prediction procedure called LINUS, developed by Srinivasan and Rose. LINUS does not use any knowledge-based information and thus provides a clear picture of the roles played by the different factors in folding. Furthermore, the algorithm for determining the structure is not based on energy minimization. Instead, LINUS captures the interplay between energy and entropy in determining the local secondary structure.

The most powerful aspect of LINUS is its simplicity. It is based on just four essential aspects of protein behavior: (1) excluded volume, (2) preferred occupancies of the dihedral angles in certain regions of the Ramachandran plot, (3) hydrophobic interactions and hydrogen bonding, and (4) the hierarchical organization of protein structures. In spite of this simplicity, LINUS has already proved effective in predicting the secondary and super-secondary structures of protein fragments. Note that the hierarchical algorithm steers folding along specific pathways, so the resulting structure does not necessarily correspond to the global energy minimum.
In a subsequent study, Srinivasan and Rose used LINUS to propose a physical basis for secondary structures. That study showed that protein secondary structures are mainly determined by steric effects and local interactions. This conclusion has recently obtained strong support from experimental evidence that unfolded protein conformations under highly denaturing conditions, and thus in the absence of long-range contacts, are still characterized by a local native-like topology. In LINUS, the conformational bias towards a type of secondary structure is determined through the probability of being in this conformation during the simulation.

We have found this idea intriguing and worthy of careful reexamination. Here, we assess how well secondary structures can be predicted from an analysis of conformational biases. In particular, we concentrate on the role played by the temperature, T, in determining the success rate. We find that there is an optimal T at which the secondary structure prediction is best. For most of the proteins studied, this temperature coincides with the one that Rose and Srinivasan used in their studies. It is also found to be near the peak in the specific heat, where the conformational conversion in the system is the largest. The optimal conditions for structure prediction do not depend much on whether the interaction window allows purely local or also non-local interactions. They are also insensitive to the duration of the simulations: we obtained very similar results when long, nearly equilibrium simulations were considered.

The aim of our study is to elucidate how LINUS works 
and what its strengths and weaknesses are. The ultimate 
goal would be to determine what kinds of improvements 
could be made in this physically appealing framework to 
move towards first principles tertiary structure predic- 
tion. 



METHODS 

A detailed description of LINUS can be found in the original papers of Srinivasan and Rose. We have developed our own version of LINUS that strictly follows the improved version described in the PNAS paper. Briefly, LINUS considers the coordinates of all backbone atoms, whereas a sidechain is represented in a simplified manner. Specifically, glycine has no sidechain and alanine's sidechain is made of a Cβ. The remaining amino acids are represented by a Cβ and one or two pseudo-atoms, depending on whether or not the sidechain is branched. The atoms are modeled as hard spheres that are not allowed to overlap. The sizes of the spheres depend on the type of atom, and the sizes of the pseudo-atoms depend on the size of the sidechains they represent.

Apart from steric interactions, the Hamiltonian consists of just a few terms that provide attraction between atoms: hydrogen bonding (H-bond), hydrophobic interaction, and salt bridges. All backbone nitrogens, except for those that belong to a proline, are considered to be H-bond donors. Each participates in no more than one H-bond, but the nitrogen at the N-terminus may participate in up to three H-bonds. The backbone oxygens and the sidechains of some amino acids (Ser, Thr, Asn, Asp, Gln, Glu) are acceptors. A backbone-to-backbone hydrogen bond is assumed to be formed between residues i and j when two conditions hold: the residues are at least three residues apart in the sequence, and the distance between the donor and the acceptor is smaller than 5 Å. An energy of -0.5ε is assigned, where ε is the energy unit. The energy is scaled quasi-linearly from 0 to this minimal value as the distance decreases to 3.5 Å. It is also required that the out-of-plane dihedral angle O(j)-N(i)-Cα(i)-C(i - 1) be larger than 140°. A sidechain-to-backbone hydrogen bond is formed when the donor-to-acceptor distance is smaller than 4 Å and the acceptor is no further than four residues away from the donor in the sequence. In this case an energy of -1.0ε is assigned and no scaling of the energy is involved.

Hydrophobic attraction is postulated to occur for contacts between the sidechain atoms of hydrophobic (Cys, Ile, Leu, Met, Phe, Trp, Val) and amphipathic (Ala, His, Thr, Tyr) residues. The minimal value of the contact energy is -0.5ε when both residues are hydrophobic and -0.25ε when one of them is hydrophobic and the other is amphipathic. A contact between two atoms i and j is said to form when the distance between them is smaller than R(i) + R(j) + 1.4 Å, where R(i) and R(j) are the contact radii of the two atoms. The contact radii depend on the kind of atom and are larger than the hard-sphere radii. The energy of a contact scales from 0 to its minimal value as the distance between the two atoms decreases from its cutoff value to R(i) + R(j). A salt bridge is assigned to contacts between oppositely charged groups, namely the sidechains of Arg or Lys with Glu or Asp. The minimal energy of a salt bridge is -0.5ε. LINUS also includes an energy term that chases residues away from the right-hand side of the Ramachandran plot. When a residue has a positive torsional angle φ, it is penalized with an energy of +1.0ε if it is not a glycine; otherwise it is rewarded with an energy of -1.0ε.
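The distance-dependent attractive terms above share one shape: the energy is zero at a cutoff distance and reaches a minimum at a shorter distance. The sketch below illustrates that shape under explicit assumptions. The "quasi-linear" scaling is replaced by a strictly linear ramp, and the H-bond cutoffs follow the 5 Å and 3.5 Å reading above. The helper names (`ramp_energy`, `hbond_energy`, `hydrophobic_contact_energy`) and any radii passed to them are hypothetical. The angular and sequence-separation criteria for H-bonds are left out.

```python
def ramp_energy(r, r_cut, r_min, e_min):
    """Piecewise-linear attraction: 0 at or beyond r_cut, e_min at or below r_min.

    ASSUMPTION: LINUS scales these energies 'quasi-linearly'; a straight
    linear interpolation is used here for illustration only.
    """
    if r >= r_cut:
        return 0.0
    if r <= r_min:
        return e_min
    # Fraction of the way from the cutoff in towards the inner distance.
    frac = (r_cut - r) / (r_cut - r_min)
    return frac * e_min


def hbond_energy(r_donor_acceptor):
    """Backbone-backbone H-bond energy in units of epsilon (distances in Å).

    Only the distance dependence is modeled: 0 at 5 Å, -0.5 at 3.5 Å.
    """
    return ramp_energy(r_donor_acceptor, r_cut=5.0, r_min=3.5, e_min=-0.5)


def hydrophobic_contact_energy(r, radius_i, radius_j, both_hydrophobic):
    """Sidechain contact energy in units of epsilon (hypothetical radii in Å)."""
    r_min = radius_i + radius_j          # the energy reaches its minimum here
    r_cut = r_min + 1.4                  # a contact forms below this distance
    e_min = -0.5 if both_hydrophobic else -0.25
    return ramp_energy(r, r_cut, r_min, e_min)
```

For example, with hypothetical contact radii of 2 Å each, a hydrophobic pair at 4.7 Å sits halfway between the 5.4 Å cutoff and 4.0 Å. Under the linear assumption it contributes -0.25ε.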

The main degrees of freedom used in LINUS are the Ramachandran torsional angles φ and ψ and the torsional angles χ, which correspond to rotations of the sidechains. Additionally, the torsional angle ω about the peptide bond and the N-Cα-C bond angle are allowed to be perturbed slightly during the simulation. All other bond angles and bond lengths are kept fixed. Three consecutive residues (i, i + 1, i + 2) are perturbed at a time, and the moves advance from the N-terminus to the C-terminus. The moves at the i-th residue are repeatedly chosen until a move is obtained in which there are no steric clashes within the three-residue fragment considered. At the next stage, the whole protein chain is checked for the presence of steric clashes. Up to 50 such attempts are performed in order to find a conformation without any steric clashes. If the new conformation still has steric clashes, it is rejected. Otherwise, it is accepted with probability $\min\{1, e^{-\Delta E/k_B T}\}$, where $k_B$ is the Boltzmann constant, T is the temperature measured in units of ε/k_B, and ΔE is the energy difference. A complete progression from the N- to the C-terminus is called a cycle.
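The acceptance step above is a Metropolis criterion combined with a hard steric filter. Below is a minimal Python sketch assuming reduced units, with energies in ε and T in ε/k_B so that k_B = 1. The callables `propose` and `has_clash` are hypothetical stand-ins for the LINUS move generator and clash check, which are not specified here.

```python
import math
import random


def find_clash_free(propose, has_clash, max_attempts=50):
    """Draw up to max_attempts trial conformations; return the first without clashes.

    Returns None if every attempt still clashes, in which case the move is rejected.
    """
    for _ in range(max_attempts):
        trial = propose()
        if not has_clash(trial):
            return trial
    return None


def metropolis_accept(delta_e, temperature, u=None):
    """Accept with probability min(1, exp(-delta_e / temperature)), with k_B = 1.

    u is an optional uniform random number in [0, 1); it is drawn here if omitted.
    """
    if delta_e <= 0.0:
        return True  # downhill or neutral moves are always accepted
    if u is None:
        u = random.random()
    return u < math.exp(-delta_e / temperature)
```

Passing `u` explicitly keeps the rule deterministic for testing. In a simulation loop one would call `find_clash_free` first and apply `metropolis_accept` only to a clash-free trial.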

LINUS uses a smart move set that consists of the following equally probable move types:

1. α-helix: three consecutive residues (i - 1, i, i + 1) are set to φ = -64 ± 7°, ψ = -43 ± 7°.

2. β-strand: three residues (i - 1, i, i + 1) are set to φ = -130 ± 15°, ψ = 135 ± 15°. If a residue is a proline, its φ is reset to -70 ± 15°.

3. turn: there are 4 types of turns, namely I, I', II and II'. Each turn type can be applied in two ways:
   a) residues (i - 1, i) are set to the turn φ and ψ values, while residue i + 1 is set to random coil;
   b) residue i - 1 is set to random coil, while residues (i, i + 1) are set to the turn φ and ψ values.
   Overall there are 8 such possibilities. The turn φ and ψ values for two consecutive residues are given below for each type of turn move. The residues are denoted i - 1 and i, but they can also be i and i + 1.

   i. Type I:
      residue (i - 1): φ = -60 ± 15°, ψ = -30 ± 15°
      residue (i): φ = -90 ± 15°, ψ = 0 ± 15°

   ii. Type I':
      residue (i - 1): φ = 55 ± 15°, ψ = 40 ± 15°
      residue (i): φ = 80 ± 15°, ψ = 5 ± 15°

   iii. Type II:
      residue (i - 1): φ = -60 ± 15°, ψ = 110 ± 15°
      residue (i): φ = 90 ± 15°, ψ = 0 ± 15°

   iv. Type II':
      residue (i - 1): φ = 60 ± 15°, ψ = -120 ± 15°
      residue (i): φ = -80 ± 15°, ψ = 0 ± 15°

   For all turn moves, if a residue is a proline, its φ is reset to -70 ± 15°.

4. random coil: φ and ψ are chosen randomly in one of the favored regions of the Ramachandran plot. For non-glycine and non-proline residues, (φ, ψ) ∈ {(-135 ± 45°, 135 ± 45°), (-75 ± 30°, -30 ± 30°), (75 ± 15°, 30 ± 15°)}. For glycine, φ ∈ {90 ± 30°, 180 ± 30°} and ψ ∈ {0 ± 30°, 180 ± 30°}. For proline, φ = -70 ± 15° and ψ ∈ {135 ± 30°, -45 ± 30°}.

For the first three move types, ω = 180 ± 5°, whereas for the coil move ω = 180 ± 10°. For all move types, the sidechain torsional angles (χ) are chosen at random in 10° windows around -60°, 60° and 180°.
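The helix and strand moves above can be sketched as uniform draws inside the quoted windows. This minimal illustration assumes that "x ± d" means a uniform draw on [x - d, x + d]. The dictionary `MOVES` and the function names are hypothetical. The turn, coil, proline and sidechain rules are omitted for brevity.

```python
import random

# Centers and half-widths (degrees) of the helix and strand moves listed above.
# ASSUMPTION: "x ± d" is read as a uniform draw from [x - d, x + d].
MOVES = {
    "helix": {"phi": (-64.0, 7.0), "psi": (-43.0, 7.0), "omega": (180.0, 5.0)},
    "strand": {"phi": (-130.0, 15.0), "psi": (135.0, 15.0), "omega": (180.0, 5.0)},
}


def draw(window, rng):
    """Uniform draw within center +/- half_width."""
    center, half_width = window
    return rng.uniform(center - half_width, center + half_width)


def secondary_move(i, kind, rng=random):
    """New (phi, psi, omega) triples for residues i-1, i, i+1 for a helix or strand move."""
    spec = MOVES[kind]
    return {
        k: tuple(draw(spec[angle], rng) for angle in ("phi", "psi", "omega"))
        for k in (i - 1, i, i + 1)
    }
```

A complete generator would first pick one of the four move types with equal probability, for example with `rng.choice`, before drawing angles. Passing a seeded `random.Random` instance as `rng` makes runs reproducible.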



The conformational bias, P, of a given type of secondary structure is defined as the probability of being in this structure during the simulation. P is usually computed as a function of the residue position in the sequence. Computing P requires a secondary structure assignment procedure, which determines the type of secondary motif a residue belongs to at a given instant. We use the assignment procedure from the most recent unpublished development of LINUS, which proceeds through the following steps:

1. Set all residues to the coil conformation (c).

2. For i running from 1 through N - 3, where N is the number of residues, compute the torsion θ between four consecutive Cα atoms (i, i + 1, i + 2, i + 3).

   a. If |θ| > 135°, then residues (i + 1) and (i + 2) are set to the strand conformation (s).

   b. If 45° < θ < 65°, then residues (i + 1) and (i + 2) are set to the helix conformation (h).

   c. If -50° < θ < 45°, then residues (i + 1) and (i + 2) are set to the turn conformation (t).

3. Check all residues from 1 through N again:

   a. If a segment of fewer than 5 residues with an h assignment is found, then all residues in this segment are set to t.

   b. If a segment of fewer than 3 residues with an s assignment is found, then all residues in this segment are set to c.
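The three-step assignment can be sketched directly. This minimal Python illustration is not the unpublished LINUS code. It assumes that the Cα torsion uses the IUPAC sign convention, under which a right-handed α-helix gives about +50°. It also converts the 1-based indices of the text to 0-based. When two windows share a residue, the later window overwrites the earlier one, a detail the text does not specify. All function names are hypothetical.

```python
import math


def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]


def _dot(a, b):
    return sum(a[k] * b[k] for k in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed torsion in degrees (IUPAC sign convention) defined by four 3-D points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    y = _dot(_cross(n1, n2), b1) / math.sqrt(_dot(b1, b1))
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def _relabel_short_runs(labels, target, min_len, new_label):
    """Replace every run of `target` shorter than min_len by new_label."""
    out, i = list(labels), 0
    while i < len(out):
        if out[i] != target:
            i += 1
            continue
        j = i
        while j < len(out) and out[j] == target:
            j += 1
        if j - i < min_len:
            out[i:j] = [new_label] * (j - i)
        i = j
    return out


def assign_from_torsions(thetas):
    """Steps 1-3; thetas[i] is the C-alpha torsion of residues i..i+3 (0-based)."""
    labels = ["c"] * (len(thetas) + 3)                  # step 1
    for i, theta in enumerate(thetas):                  # step 2
        if abs(theta) > 135.0:
            label = "s"
        elif 45.0 < theta < 65.0:
            label = "h"
        elif -50.0 < theta < 45.0:
            label = "t"
        else:
            continue
        labels[i + 1] = labels[i + 2] = label
    labels = _relabel_short_runs(labels, "h", 5, "t")   # step 3a
    labels = _relabel_short_runs(labels, "s", 3, "c")   # step 3b
    return "".join(labels)


def assign(ca_coords):
    """Assignment string from a list of (x, y, z) C-alpha coordinates."""
    thetas = [dihedral(*ca_coords[i:i + 4]) for i in range(len(ca_coords) - 3)]
    return assign_from_torsions(thetas)
```

Consider six consecutive torsions of 60°: they label a seven-residue helix flanked by coil ("chhhhhhhc"). A two-residue helical stretch, in contrast, is demoted to turn by step 3a.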

To compute P, one starts from an open conformation and runs a simulation of 1000 cycles. After each cycle, the secondary structure assignment is determined to gather statistics on P. The average is taken over 10 simulations at each T.
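The sketch below shows one way to tally P from per-cycle assignment strings. It also derives the per-site prediction used in the Results, where each site takes the type with the highest bias. The input format is an assumption: one string over {s, h, t, c} per snapshot, pooled over all runs at a given T. The function names are hypothetical. Ties are broken in the fixed order s, h, t, c, which the text does not specify.

```python
from collections import Counter


def conformational_bias(assignments):
    """P at each residue: fraction of snapshots in which it is s, h, t or c."""
    n_snap = len(assignments)
    n_res = len(assignments[0])
    bias = []
    for i in range(n_res):
        counts = Counter(snap[i] for snap in assignments)
        bias.append({x: counts[x] / n_snap for x in "shtc"})
    return bias


def predict(bias):
    """Predicted label at each site: the type with the largest bias."""
    return "".join(max(p, key=p.get) for p in bias)
```

With 10 runs of 1000 cycles, `assignments` would hold 10 000 strings for a given temperature.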

To compare with the DSSP-based native assignments used in the PDB, we adopt a simplified correspondence. The 3₁₀, π and α-helices correspond to h, isolated β-bridges and extended β-strands to s, hydrogen-bonded turns to t, and bends and undefined segments to c. It should be noted that the native-state secondary structure assignment used by Srinivasan and Rose for the proteins studied does not fully agree with the one used in the PDB. In the following, our results are benchmarked against the PDB-based assignment.
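The reduction from DSSP codes to the four LINUS classes can be written as a lookup table. The sketch assumes the standard DSSP single-letter codes: H (α-helix), G (3₁₀-helix), I (π-helix), E (extended strand), B (isolated β-bridge), T (hydrogen-bonded turn), S (bend), and a blank for no assignment. Any code not listed, including bends and blanks, falls back to c. The names are hypothetical.

```python
# Correspondence described above: helices -> h, strands and bridges -> s, H-bonded turns -> t.
DSSP_TO_LINUS = {"H": "h", "G": "h", "I": "h", "E": "s", "B": "s", "T": "t"}


def reduce_dssp(dssp_string):
    """Map a per-residue DSSP string onto the four-letter alphabet {h, s, t, c}."""
    # Bends (S), blanks and any other code are treated as coil.
    return "".join(DSSP_TO_LINUS.get(code, "c") for code in dssp_string)
```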

To explore the role of local and non-local interactions, we consider two choices for the interaction window, Λ: 6 and N. The interaction window restricts interactions along the sequence. Λ = 6 means that all interactions between two residues i and j with |i - j| > 6 are switched off, whereas for Λ = N all interactions are present.
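In code, the interaction window reduces to a filter on sequence separation, applied before any pairwise attraction term is evaluated. Below is a hypothetical helper in which `window=None` stands for Λ = N. Whether steric exclusion is also windowed is not stated in the text, so this sketch is meant only for the attractive terms.

```python
def interacting(i, j, window=None):
    """True if residues i and j may interact under the window (None means no restriction)."""
    return window is None or abs(i - j) <= window
```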
RESULTS AND DISCUSSION

We begin our discussion with protein G (PDB code 1GB1). Among the proteins studied by Srinivasan and Rose, it shows the best conformational biases towards native secondary structures. Figure 1 shows P as a function of residue at three different temperatures, T = 0.8ε/k_B, 0.5ε/k_B and 0.2ε/k_B. Of these, T = 0.5ε/k_B is the temperature that Srinivasan and Rose used in their simulations. The interaction window is set to 6. The conformational biases towards α-helices (h), β-strands (s), turns (t) and coils (c) are shown. Note that at T = 0.8ε/k_B the strands dominate over all other structures: the whole protein chain prefers to be in the strand conformation at high temperatures. This follows from the simple observation that the entropy is largest in the strand conformation, and entropy is the dominant factor in the free energy at high temperatures. At low temperatures, such as T = 0.2ε/k_B, the bias towards strands vanishes while the highest biases belong to helices and turns. This is because helices and turns involve favorable interactions, which are predominantly local and are thus stabilized at low temperatures. At the intermediate temperature, T = 0.5ε/k_B, the dominating structure varies along the sequence. Some parts of the protein prefer to be in a strand conformation, while others form helices and turns.

Because the biases depend strongly on T, one may ask at which temperature the native secondary structure can be most reliably predicted. To answer this question, we have carried out an extensive analysis of the biases over a wide range of temperatures. Figure 2a shows two sequences of secondary structure assignments. The first corresponds to the known native conformation of protein G. The second is obtained from the biases given in the middle panel of Figure 1, i.e. at T = 0.5ε/k_B. In the latter case, the assignment at a given site is set to the type of secondary structure showing the highest bias. We introduce a parameter η that estimates the overlap between the two sets of assignments for each kind of secondary structure. Consider a given type of conformation x (x ∈ {s, h, t, c}). We count the sites at which both assignments (from the PDB and from the biases) are x, and divide by the number of sites at which at least one of the assignments is x. Specifically, let A be the set of sites of type x in the PDB assignment and B the set of sites of the same type predicted by the biases. Then

$$\eta = \frac{f(A \cap B)}{f(A \cup B)},$$

where f(X) is a function that returns the number of elements in X. Thus, if A = B then η = 1. We call η the rate of successful prediction.
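The rate η is the Jaccard index of the two site sets A and B. The minimal sketch below assumes that both assignments are strings over {s, h, t, c} of equal length. The function name is hypothetical. Returning NaN when type x appears in neither assignment is our own choice, since the text leaves that case undefined.

```python
def success_rate(native, predicted, x):
    """eta = |A ∩ B| / |A ∪ B| for conformation type x."""
    if len(native) != len(predicted):
        raise ValueError("assignments must cover the same residues")
    a = {k for k, c in enumerate(native) if c == x}      # sites of type x in the PDB
    b = {k for k, c in enumerate(predicted) if c == x}   # sites of type x from the biases
    union = a | b
    if not union:
        return float("nan")  # x absent from both assignments: eta undefined
    return len(a & b) / len(union)
```

For instance, with native "ssshhhhtc" and predicted "sschhhhtt", the strand rate is 2/3, the helix rate is 1 and the turn rate is 0.5.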

Figure 2b shows η as a function of T for the strands, helices and turns of protein G. Note that the values of η are the largest around T = 0.5ε/k_B. At this temperature the prediction rate is highest for helices and exceeds 90% (it is 100% at T = 0.55ε/k_B). Strands and turns are predicted at the 74% and 23% levels, respectively. These values of η were obtained with reference to the PDB assignment. If the Srinivasan and Rose assignment is used instead, the corresponding success rates are 90%, 63% and 35%, respectively. As T increases, the rate for the strands first decreases and then remains roughly constant. The rate for helices, by contrast, drops rapidly and vanishes at T = 0.7ε/k_B. A nearly opposite scenario is observed as T becomes smaller than 0.5ε/k_B: the rate for strands drops rapidly while it remains high for the helices.

Figure 3 shows η as a function of T for protein G, but calculated with Λ = N. We still observe the same picture as for Λ = 6, except that at low temperatures the prediction rates for helices and strands become somewhat higher. The optimal temperature, however, remains close to 0.5ε/k_B.

Figures 4 and 5 show η as a function of T for 6 other proteins in the set studied by Srinivasan and Rose, for Λ = 6 and Λ = N respectively. For Λ = 6, as for protein G, the best prediction rates are obtained at about T = 0.5ε/k_B for all of the proteins except plastocyanin (6PCY) and myohemerythrin (2HMQ). These two proteins are special because each consists of only one kind of secondary structure in addition to turns. The native conformation of plastocyanin is built only of β-strands, and that of myohemerythrin is a four-helix bundle. The best prediction rates are obtained for T > 0.4ε/k_B for plastocyanin and for T < 0.4ε/k_B for myohemerythrin. For Λ = N, the optimal temperature varies a little, but for most of the proteins it remains in the range from 0.4ε/k_B to 0.6ε/k_B. It should also be noted that, when Λ = N, the rates for the strands at low temperatures become significantly larger than in the Λ = 6 case for all proteins. The reason for this behavior is that, at low T, the strands can be stabilized only by non-local interactions, which are absent when Λ = 6. The results in Figures 2 through 5 remain generally similar when the Srinivasan-Rose secondary structure assignment is used, underscoring the robustness of our results. The only difference is that the predictions pertaining to the turns are improved compared to the PDB-based secondary structure assignment.

What principle governs the choice of the optimal temperature? Figure 6 shows how conformational changes occur with temperature for each residue of protein G for Λ = 6. Most of the strands are clearly destabilized at low temperatures, whereas helices are absent at high temperatures. Thus, for both kinds of structure to be predicted, the optimal temperature should lie in an intermediate range where helices have started to form but strands have not vanished. Figure 6 also shows that, as the temperature is lowered, the strands undergo a transition to helices or to other kinds of structures such as turns or coils. A helix can also be formed from a coil as the temperature continues to decrease. Helices and turns are associated with the establishment of H-bonds, while strands and coils usually have no contacts. Such transitions therefore entail a change in energy, which is reflected in the specific heat. This suggests that the optimal temperature for secondary structure prediction ought to be in the vicinity of the peak of the specific heat. It is likely a bit higher than the temperature of the maximum, in order not to discriminate against the strands.

The connection between the thermodynamics of the system and the optimal temperature for prediction is shown in Figure 7 for protein G and in Figure 8 for plastocyanin and myohemerythrin. The specific heat C as a function of temperature is calculated using the histogram technique. For each protein, we performed a long simulation of 200 000 cycles at T = 0.5ε/k_B. From it we deduced the thermodynamic behavior at that temperature and at other temperatures in its vicinity. The results are shown for the two values of Λ. For Λ = 6, the maximum in C occurs at roughly T = 0.4ε/k_B for the three proteins. For Λ = N, the peak in C is higher and occurs at a slightly higher temperature due to the presence of the long-range interactions. (It is interesting to note that the experimentally determined specific heat of plastocyanin shows a sharp maximum around 68°C.) Note that for protein G the optimal temperature for secondary structure prediction is found in the vicinity of the peak in the specific heat, just to the right of the maximum. A similar behavior is observed for plastocyanin, except that the optimal temperature at Λ = 6 is farther from the maximum in C. This is because plastocyanin consists only of β-sheets, and the strands are more favored at high temperatures. Myohemerythrin, whose native state contains mainly α-helices, shows just the opposite behavior. For this protein the optimal temperature for prediction lies on the low-temperature side of the peak in the specific heat. A comparison of Figures 4 and 5 shows that, at the temperature corresponding to the maximum in C, the rates of successful prediction are already close to their best values.
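The text does not specify which "histogram technique" is used. A common choice is single-histogram (Ferrenberg-Swendsen) reweighting, sketched below under that assumption in reduced units (k_B = 1). Energies sampled at T0 are reweighted by exp(-(1/T - 1/T0)E), and the specific heat follows from the energy fluctuations, C = (⟨E²⟩ - ⟨E⟩²)/T². The function name and input format, a plain list of sampled energies, are hypothetical.

```python
import math


def reweighted_specific_heat(energies, t0, t):
    """Specific heat at temperature t from energies sampled at t0 (k_B = 1).

    Single-histogram reweighting: each sample gets weight exp(-(1/t - 1/t0) * E).
    The estimate is reliable only for t close to t0, where the sampled energies overlap.
    """
    dbeta = 1.0 / t - 1.0 / t0
    logw = [-dbeta * e for e in energies]
    shift = max(logw)                        # subtract the max log-weight to avoid overflow
    w = [math.exp(lw - shift) for lw in logw]
    z = sum(w)
    mean_e = sum(wi * e for wi, e in zip(w, energies)) / z
    mean_e2 = sum(wi * e * e for wi, e in zip(w, energies)) / z
    return (mean_e2 - mean_e ** 2) / t ** 2
```

The same weights can reweight any other observable recorded along the run. One example is an indicator that residue i is in state x, which would yield the biases at nearby temperatures under the same assumption.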

The results described so far are based on short-time simulations, which last for 1000 cycles at each T. Figure 9 shows η as a function of T for protein G when the conformational biases are determined from nearly equilibrium simulations. These simulations are performed similarly to the calculation of the specific heat. We run a long simulation of 200 000 cycles at T = 0.5ε/k_B and then use the histogram method to obtain the biases at other temperatures. The profiles of η versus T are surprisingly similar to those shown in Figure 3, and the peaks seem to be even more pronounced. The predictions given by the conformational biases are thus insensitive to the length of the simulations, at least for temperatures close to the optimal value.



CONCLUSIONS 

Our results confirm that the analysis of conformational biases is a fast and useful tool for obtaining information about native protein secondary structures. We find that the most common secondary motifs, α-helices and β-strands, can be predicted with an accuracy ranging from roughly 40% up to 100%. Our analysis shows that the rate of successful prediction is insensitive to the interaction window and to the length of the simulations. The choice of temperature, however, appears to be critical. The optimal run temperature is found to be related to the peak temperature of the specific heat. Commonly used algorithms attempt to minimize an energy function to determine the native state structure. LINUS, in contrast, relies on a delicate interplay between the entropy favoring the strands and the energetic considerations favoring turns and helices. LINUS has no procedure that allows strands to assemble into a sheet through the appropriate non-local interactions. Secondary structure prediction therefore depends, in essence, on the strand conformation persisting down to intermediate temperatures in regions that are strands in the native structure. Other regions, meanwhile, adopt the helix and turn conformations because of the energy gained through local contacts. An improvement in prediction might be expected from extending the "Local Independently Nucleated Units of Structure" with some judiciously chosen non-local interactions for the assembly of β-sheets.
FIGURE CAPTIONS

Fig. 1. Conformational biases towards secondary structures, P, as functions of residue, determined for protein G at three different temperatures, T = 0.8ε/k_B, 0.5ε/k_B and 0.2ε/k_B. The types of secondary structures are denoted h for α-helices (continuous line), s for β-strands (dotted line), t for turns (dashed line) and c for coils (long-dashed line). The biases are computed by averaging over 10 trajectories, each of 1000 cycles, starting from an open conformation. The interaction window is set to 6.

Fig. 2. a) The assignment of the secondary structures for protein G extracted from the PDB structure using the DSSP method of Kabsch and Sander (box), and predicted from the analysis of the conformational biases at T = 0.5ε/k_B and for Λ = 6. b) The rate of success in prediction, η, as a function of temperature for three kinds of secondary structures: helix (h), strand (s) and turn (t). The simulations are performed for protein G with Λ = 6. At each temperature studied, the conformational biases are computed by averaging over 10 trajectories, each of 1000 cycles, starting from an open conformation. The error bars are determined from three simulations at each temperature.

Fig. 3. The rate of success in prediction, η, as a function of temperature for three kinds of secondary structures, helix (h), strand (s) and turn (t), for protein G with Λ = N, i.e. with no restriction on the range of interactions. The details are the same as in the lower part of Figure 2.

Fig. 4. The rate of successful prediction of secondary structures as a function of temperature for plastocyanin (6PCY), myohemerythrin (2HMQ), staphylococcal nuclease (1STG), ubiquitin (1UBQ), ribonuclease A (7RSA) and ribonuclease H (2RN2). The simulations are performed in the same way as for protein G, as described in the caption of Figure 1. The interaction window is set equal to 6.

Fig. 5. Same as Figure 4 but with Λ = N.

Fig. 6. Top: Conformational diagram plotted as a function of temperature and residue in protein G. The dark and light grey areas correspond to the helix (h) and strand (s) conformations, respectively. In these calculations, Λ = 6. Bottom: strand (thin box) and helix (thick box) fragments found in the native conformation of protein G.

Fig. 7. The specific heat as a function of temperature for protein G. The thermodynamic averages were obtained by performing a long simulation of 200 000 cycles at T = 0.5ε/k_B and then using the histogram method to extract quantities at other temperatures. The lower peak (continuous line) and the higher one (dashed line) correspond to an interaction window equal to 6 and N, respectively. The arrows show the temperatures at which the secondary structures are best predicted for the two values of Λ.

Fig. 8. Same as Figure 7 but for plastocyanin (top) and myohemerythrin (bottom).

Fig. 9. Same as Figure 3, but the rates of successful prediction are computed from an equilibrium simulation. The secondary structure biases as a function of temperature are computed using the histogram method. A long simulation of 200 000 cycles at T = 0.5ε/k_B is performed to extract quantities at other nearby temperatures.